Abstract. We study the convergence speed of distributed iterative algorithms for the con-
sensus and averaging problems, with emphasis on the latter. We first consider the case of a fixed
communication topology. We show that a simple adaptation of a consensus algorithm leads to an 
averaging algorithm. We prove lower bounds on the worst-case convergence time for various classes 
of linear, time-invariant, distributed consensus methods, and provide an algorithm that essentially 
matches those lower bounds. We then consider the case of a time-varying topology, and provide a 
polynomial-time averaging algorithm. 

1. Introduction. Given a set of autonomous agents — which may be sensors, 
nodes of a communication network, cars, or unmanned aerial vehicles — the dis- 
tributed consensus problem asks for a distributed algorithm that the agents can use 
to agree on an opinion (represented by a scalar or a vector), starting from different 
initial opinions among the agents, and in the presence of possibly severely restricted 
communications. 

Algorithms that solve the distributed consensus problem provide the means by 
which networks of agents can be coordinated. Although each agent may have access 
to different local information, the agents can agree on a decision (e.g., on a common 
direction of motion, on the time to execute a move, etc.). Such synchronized behavior
has often been observed in biological systems [To].

The distributed consensus problem has historically appeared in many diverse 
areas, such as parallel computation [30l[3lJ__], control theory [El [28], and commu-
nication networks [231 [___]. Recently, the problem has attracted significant attention
[TiliaillTTlIlliaiTlIigiMIIliaiBII], motivated by new contexts and open prob- 
lems in communications, sensor networks, and networked control theory. We briefly 
describe some of the more recent applications.

Reputation management in ad hoc networks: It is often the case that the nodes 
of a wireless multi-hop network are not controlled by a single authority or do not have 
a common objective. Selfish behavior among nodes (e.g., refusing to forward traffic 
meant for others) is possible, and some mechanism is needed to enforce cooperation. 
One way to detect selfish behavior is reputation management: each node forms an 
opinion by observing the behavior of its neighbors. One is then faced with the prob- 
lem of combining these different opinions into a single globally available reputation 
measure for each node. The use of distributed consensus algorithms for doing this 
was explored in [22], where a variation of one of the methods we examine here (the
"agreement algorithm") was used as a basis for an empirical investigation.
Sensor networks: A sensor network designed for detection or estimation needs to 
combine various measurements into a decision or into a single estimate. Distributed 
computation of this decision/estimate has the advantage of being fault-tolerant (net- 
work operation is not dependent on a small set of nodes) and self-organizing (network 
functionality does not require constant supervision) [311 [5J [TT].

Control of autonomous agents: It is often necessary to coordinate collections of 
autonomous agents (e.g., cars or UAVs). For example, one may wish for the agents to 
agree on a direction or speed. Even though the data related to the decision may be 
distributed through the network, it is usually desirable that the final decision depend 
on all the known data, even though most of them are unavailable at each node. A 
model motivated by such a context was empirically investigated in [32] . 

In this paper, we focus on a special case of the distributed consensus problem, the 
distributed averaging problem. Averaging algorithms guarantee that the final global 
value will be the exact average of the initial individual values. Our general objective 
is to characterize the worst-case convergence time of various averaging algorithms, as 
a function of the number n of agents, and to understand their fundamental limitations 
by providing lower bounds on the convergence time. 

We now outline the remainder of this paper, and preview the main contributions. 
In Section 2, we provide some background material, by reviewing the agreement algo-
rithm of [30, 31] for the distributed consensus problem. In Sections 3-7, we consider
the case of fixed graphs. In Section 3, we discuss three different ways that the agree-
ment algorithm can provide a solution to the averaging problem. In particular, we
show how an averaging algorithm can be constructed based on two parallel executions
of the agreement algorithm. In Section 4, we define the notions of convergence rate
and convergence time, and provide a variational characterization of the convergence 
rate. 

In Section 5, we use results from [23] to show that the worst-case convergence time
of an averaging algorithm introduced in Section 3 is essentially $\Theta(n^3)$. In Section 6,
we show that for one of our methods, the convergence rate can be made arbitrarily fast.
On the other hand, under an additional restriction that reflects numerical stability
considerations, we show that the convergence time of a certain class of algorithms (and
by extension of a certain class of averaging algorithms) is $\Omega(n^2)$, in the worst case.
We also provide a simple method (based on executing the agreement algorithm on a
spanning tree) whose convergence time essentially matches the $\Omega(n^2)$ lower bound.
In Section 7, we discuss briefly particular methods that employ doubly stochastic
matrices and their potential drawbacks.

Then, in Section 8, we turn our attention to the case of dynamic topologies.
For the agreement algorithm, we show that its convergence time for the case of non-
symmetric topologies can be exponentially large in the worst case. On the other
hand, for the case of symmetric topologies, we provide a new averaging algorithm
(and therefore, an agreement algorithm as well), whose convergence time is $O(n^3)$.
To the best of our knowledge, none of the existing consensus or averaging algorithms
in the literature has a similar guarantee of polynomial time convergence in the pres-
ence of dynamically changing topologies. In Section 9, we report on some numerical
experiments illustrating the advantages of two of our algorithms. Section 10 contains
some brief concluding remarks.

2. The agreement algorithm. The "agreement algorithm" is an iterative pro-
cedure for the solution of the distributed consensus problem. It was introduced in [10]
for the time-invariant case, and in [30, 31] for the case of "asynchronous" and time-
varying environments. We briefly review this algorithm and summarize the available
convergence results.

1 Let $f$ and $g$ be two positive functions on the positive integers. We write $f(n) = O(g(n))$
[respectively, $f(n) = \Omega(g(n))$] if there exists a positive constant $c$ and some $n_0$ such that $f(n) \le c g(n)$
[respectively, $f(n) \ge c g(n)$], for all $n \ge n_0$. If $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$ both hold, we
write $f(n) = \Theta(g(n))$.

Consider a set $\mathcal{N} = \{1, 2, \ldots, n\}$ of nodes. Each node $i$ starts with a scalar
value $x_i(0)$; the vector with the values of all nodes at time $t$ is denoted by
$x(t) = (x_1(t), \ldots, x_n(t))$. The agreement algorithm updates $x(t)$ according to the equation
$x(t+1) = A(t)x(t)$, or

$$x_i(t+1) = \sum_{j=1}^n a_{ij}(t) x_j(t),$$

where $A(t)$ is a nonnegative matrix with entries $a_{ij}(t)$. The row-sums of $A(t)$ are
equal to 1, so that $A(t)$ is a stochastic matrix. In particular, $x_i(t+1)$ is a weighted
average of the values $x_j(t)$ held by the nodes at time $t$.
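
The update rule above can be illustrated with a minimal Python sketch. The helper names (`agreement_step`, `run_agreement`) and the 3-node matrix are hypothetical choices made here for illustration; the matrix is time-invariant, nonnegative, and has unit row sums.

```python
def agreement_step(A, x):
    """One update x(t+1) = A x(t), with A given as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def run_agreement(A, x0, steps):
    """Iterate the agreement algorithm for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        x = agreement_step(A, x)
    return x


# Hypothetical 3-node example: every row is nonnegative and sums to 1.
A = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
x_final = run_agreement(A, [0.0, 3.0, 9.0], steps=200)
spread = max(x_final) - min(x_final)  # shrinks toward 0 as the nodes agree
```

For this particular matrix the steady-state vector is (1/4, 1/2, 1/4), so the nodes agree on 3.75, which is a weighted combination of the initial values and not their plain average 4.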

We next state some conditions under which the agreement algorithm is guaranteed 
to converge. 

Assumption 2.1. There exists a positive constant $\alpha$ such that:

(a) $a_{ii}(t) \ge \alpha$, for all $i$, $t$.

(b) $a_{ij}(t) \in \{0\} \cup [\alpha, 1]$, for all $i$, $j$, $t$.

(c) $\sum_{j=1}^n a_{ij}(t) = 1$, for all $i$, $t$.

Intuitively, whenever $a_{ij}(t) > 0$, node $j$ communicates its current value $x_j(t)$ to
node $i$. Each node $i$ updates its own value, by forming a weighted average of its own
value and the values it has just received from other nodes. We represent the sequence
of communications between nodes by a sequence $G(t) = (\mathcal{N}, \mathcal{E}(t))$ of directed graphs,
where $(j,i) \in \mathcal{E}(t)$ if and only if $a_{ij}(t) > 0$. Note that $(i,i) \in \mathcal{E}(t)$ for all $t$, and this
condition will remain in effect throughout the paper.

Our next assumption requires that following an arbitrary time t, and for any i, j, 
there is a sequence of communications through which node i will influence (directly 
or indirectly) the value held by node j. 

Assumption 2.2 (Connectivity). For every $t \ge 0$, the graph $(\mathcal{N}, \cup_{s \ge t} \mathcal{E}(s))$ is
strongly connected.

Assumption 2.2 by itself is not sufficient to guarantee consensus (see Exercise 3.1,
on p. 517 of [3]). This motivates the following stronger version.

Assumption 2.3 (Bounded interconnectivity times). There is some $B$ such that
for all $k$, the graph $(\mathcal{N}, \mathcal{E}(kB) \cup \mathcal{E}(kB+1) \cup \cdots \cup \mathcal{E}((k+1)B-1))$ is strongly
connected.



We note various special cases of possible interest.
Time-invariant model: In this model, introduced by DeGroot [10], the set of arcs $\mathcal{E}(t)$
is the same for all $t$; furthermore, the matrix $A(t)$ is the same for all $t$. In this case, we
are dealing with the iteration x := Ax, where $A$ is a stochastic matrix; in particular,
$x(t) = A^t x(0)$. Under Assumptions 2.1 and 2.2, $A$ is the transition probability matrix
of an irreducible and aperiodic Markov chain. Thus, $A^t$ converges to a matrix all
of whose rows are equal to the (positive) vector $\pi = (\pi_1, \ldots, \pi_n)$ of steady-state
probabilities of the Markov chain. Accordingly, we have $\lim_{t \to \infty} x_i(t) = \sum_{j=1}^n \pi_j x_j(0)$.
Bidirectional model: In this case, we have $(i,j) \in \mathcal{E}(t)$ if and only if $(j,i) \in \mathcal{E}(t)$, and
we say that the graph $G$ is symmetric. Intuitively, whenever $i$ communicates to $j$,
there is a simultaneous communication from $j$ to $i$.
Equal-neighbor model: Here,

$$a_{ij}(t) = \begin{cases} 1/d_i(t), & \text{if } j \in \mathcal{N}_i(t), \\ 0, & \text{if } j \notin \mathcal{N}_i(t), \end{cases}$$

where $\mathcal{N}_i(t) = \{j \mid (j,i) \in \mathcal{E}(t)\}$ is the set of nodes $j$ (including $i$) whose value is
taken into account by $i$ at time $t$, and $d_i(t)$ is its cardinality. This model is a linear
version of a model considered by Vicsek et al. [35]. Note that here the constant $\alpha$ of
Assumption 2.1 can be taken to be $1/n$.
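
The equal-neighbor weights can be built directly from neighbor lists. The following is a small sketch under assumptions made here: the function name `equal_neighbor_matrix`, the 0-based node labels, and the 3-node path graph are all illustrative.

```python
def equal_neighbor_matrix(neighbors):
    """Build A with a_ij = 1/d_i for j in N_i (self included), else 0.

    neighbors[i] lists the nodes other than i whose values node i receives.
    """
    n = len(neighbors)
    A = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        N_i = set(nbrs) | {i}  # N_i always contains i itself
        d_i = len(N_i)         # cardinality of N_i
        for j in N_i:
            A[i][j] = 1.0 / d_i
    return A


# Hypothetical bidirectional path graph 0 - 1 - 2.
A = equal_neighbor_matrix([[1], [0, 2], [1]])
```

Every positive entry equals $1/d_i \ge 1/n$, which is why $\alpha = 1/n$ works in Assumption 2.1.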

Theorem 2.4. Under Assumptions 2.1 and 2.3, the agreement algorithm guar-
antees asymptotic consensus, that is, there exists some $c$ (depending on $x(0)$ and on
the sequence of graphs $G(\cdot)$) such that $\lim_{t \to \infty} x_i(t) = c$, for all $i$.

Theorem 2.4 is presented in [31] and proved in [30], in a more general setting that
allows for communication delays, under a slightly stronger version of Assumption
2.3; see also Ch. 7 of [3], and [3T] [3] for extensions to the cases of communication
delays and probabilistic dropping of packets. The above version of Assumption 2.3
was introduced in [18]. Under the additional assumption of a bidirectional model, the
bounded interconnectivity time assumption is unnecessary, as established in [20, 6]
for the bidirectional equal-neighbor model, and in [17, 25] for the general case.

3. Averaging with the agreement algorithm in fixed networks. In this 
section, as well as in Sections 4-7, we assume that the network topology is fixed, i.e.,
G(t) = G for all t, and known. We consider the time-invariant version, x := Ax, 
of the agreement algorithm, and discuss various ways that it can be used to solve 
the averaging problem. We show that an iteration x := Ax that solves the consensus 
problem can be used in a simple manner to provide a solution to the averaging problem 
as well. 

3.1. Using a doubly stochastic matrix. As remarked in Section 2, with the
time-invariant agreement algorithm x := Ax, we have

$$\lim_{t \to \infty} x_i(t) = \sum_{j=1}^n \pi_j x_j(0), \quad \forall\, i, \tag{3.1}$$

where $\pi_i$ is the steady-state probability of node $i$ in the Markov chain associated
with the stochastic matrix $A$. It follows that we obtain a solution to the averaging
problem if and only if $\pi_i = 1/n$ for every $i$. Since $\pi$ is a left eigenvector of $A$, with
eigenvalue equal to 1, this requirement translates to the property $\mathbf{1}^T A = \mathbf{1}^T$, where
$\mathbf{1}$ is the vector with all components equal to 1. Equivalently, the matrix $A$ needs
to be doubly stochastic. A particular choice of a doubly stochastic matrix has been
proposed in [57] (see also [5]); it is discussed further in Sections 7 and 9.

3.2. The scaled agreement algorithm. Suppose that the graph G is fixed a 
priori and that there is a system designer or other central authority who chooses a 
stochastic matrix A offline, computes the associated steady-state probability vector 
(assumed unique and positive), and disseminates the value of $n\pi_i$ to each node $i$.

Suppose next that the nodes execute the agreement algorithm x := Ax, using the
matrix $A$, but with the initial value $x_i(0)$ of each node $i$ replaced by

$$\bar{x}_i(0) = \frac{x_i(0)}{n \pi_i}. \tag{3.2}$$

Then, the value $\bar{x}_i(t)$ of each node $i$ converges to

$$\lim_{t \to \infty} \bar{x}_i(t) = \sum_{j=1}^n \pi_j \bar{x}_j(0) = \frac{1}{n} \sum_{j=1}^n x_j(0),$$

and we therefore have a valid averaging algorithm. This establishes that any (time-
invariant) agreement algorithm for the consensus problem translates to an algorithm
for the averaging problem as well. There are two possible drawbacks of the scheme
we have just described:

(a) If some of the $n\pi_i$ are very small, then some of the initial $\bar{x}_i(0)$ will be very
large, which can lead to numerical difficulties [16].

(b) The algorithm requires some central coordination, in order to choose $A$ and
compute $\pi$.

The algorithm provided in the next subsection remedies both of the
above drawbacks.
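
A minimal sketch of the scaled agreement algorithm follows. It assumes, purely for illustration, that the designer approximates $\pi$ offline by repeated left multiplication; the function names and the 3-node matrix are hypothetical and are not part of the original text.

```python
def mat_vec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def stationary_distribution(A, iters=2000):
    """Approximate pi with pi^T A = pi^T by repeated left multiplication."""
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * A[i][j] for i in range(n)) for j in range(n)]
    return pi


def scaled_agreement(A, x0, steps):
    """Run x := A x from the scaled initial values x_i(0) / (n * pi_i)."""
    n = len(A)
    pi = stationary_distribution(A)  # done offline by the designer
    x = [x0[i] / (n * pi[i]) for i in range(n)]
    for _ in range(steps):
        x = mat_vec(A, x)
    return x


# Hypothetical stochastic matrix with pi = (1/4, 1/2, 1/4).
A = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
result = scaled_agreement(A, [0.0, 3.0, 9.0], steps=200)
```

With these inputs every node ends near 4, the exact average of 0, 3, and 9. The division by `n * pi[i]` is the step that becomes numerically delicate when some $n\pi_i$ is very small, as noted in drawback (a).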

3.3. Using two parallel passes of the agreement algorithm. Given a fixed
graph $G$, let $A$ be the matrix that corresponds to the time-invariant, equal-neighbor,
bidirectional model (see Section 2 for definitions); in particular, if $(i,j) \in \mathcal{E}$, then
$(j,i) \in \mathcal{E}$, and $a_{ij} = 1/d_i$, where $d_i$ is the cardinality of $\mathcal{N}_i$. Under Assumptions
2.1 and 2.2, the stochastic matrix $A$ is irreducible and aperiodic (because $a_{ii} > 0$ for
every $i$). Let $E = \sum_{i=1}^n d_i$. It is easily verified that the vector $\pi$ with components
$\pi_i = d_i/E$ satisfies $\pi^T = \pi^T A$, and is therefore equal to the vector of steady-state
probabilities of the associated Markov chain.

The following averaging algorithm employs two parallel runs of the agreement 
algorithm, with different, but locally determined, initial values. 

Algorithm 3.1.

(a) Each node $i$ sets $y_i(0) = 1/d_i$ and $z_i(0) = x_i(0)/d_i$.

(b) The nodes run the agreement algorithms $y(t+1) = Ay(t)$ and $z(t+1) = Az(t)$.

(c) Each node sets $x_i(t) = z_i(t)/y_i(t)$.

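We have

$$\lim_{t \to \infty} y_i(t) = \sum_{j=1}^n \pi_j y_j(0) = \sum_{j=1}^n \frac{d_j}{E} \cdot \frac{1}{d_j} = \frac{n}{E},$$

and

$$\lim_{t \to \infty} z_i(t) = \sum_{j=1}^n \pi_j z_j(0) = \sum_{j=1}^n \frac{d_j}{E} \cdot \frac{x_j(0)}{d_j} = \frac{1}{E} \sum_{j=1}^n x_j(0).$$

This implies that

$$\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{j=1}^n x_j(0),$$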

i.e., we have a valid averaging algorithm. Note that the iteration y := Ay need not 
be repeated if the network remains unchanged and the averaging algorithm is to be 
executed again with different initial opinions. Finally, if n and E are known by all 
nodes, the iteration y := Ay is unnecessary, and we could just set $y_i(t) = n/E$.
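
Algorithm 3.1 can be sketched in a few lines of Python. The function name `two_pass_average`, the neighbor-list input format, and the 3-node path graph are assumptions made for this illustration; each node uses only its own degree $d_i$.

```python
def two_pass_average(neighbors, x0, steps):
    """Sketch of Algorithm 3.1 on a fixed bidirectional graph.

    neighbors[i] lists the nodes other than i adjacent to node i.
    """
    n = len(neighbors)
    d = [len(set(nbrs) | {i}) for i, nbrs in enumerate(neighbors)]
    # Step (a): locally determined initial values.
    y = [1.0 / d[i] for i in range(n)]
    z = [x0[i] / d[i] for i in range(n)]
    # Step (b): two runs of the equal-neighbor agreement algorithm.
    for _ in range(steps):
        y = [(y[i] + sum(y[j] for j in neighbors[i])) / d[i] for i in range(n)]
        z = [(z[i] + sum(z[j] for j in neighbors[i])) / d[i] for i in range(n)]
    # Step (c): each node forms the ratio z_i(t) / y_i(t).
    return [z[i] / y[i] for i in range(n)]


# Hypothetical path graph 0 - 1 - 2 with initial values 0, 3, 9.
estimates = two_pass_average([[1], [0, 2], [1]], [0.0, 3.0, 9.0], steps=300)
```

Here $y_i(t)$ tends to $n/E = 3/7$ and $z_i(t)$ to $12/7$, so each ratio tends to the average 4. Since $y_i(t)$ is always a weighted average of positive numbers, the division in step (c) is well defined.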

4. Definition of the convergence rate and the convergence time. The
convergence rate of any of the algorithms discussed in Section 3 is determined by
the convergence rate of the matrix powers $A^t$. In this section, we give a definition
of the convergence rate (and convergence time) and provide a tool for bounding the
convergence rate. As should be apparent from the discussion in Section 3, there is no
reason to restrict to doubly stochastic matrices, or even to nonnegative matrices. We
therefore start by specifying the class of matrices that we will be interested in.

Consider a matrix $A$ with the following property: for every $x(0)$, the sequence
generated by letting $x(t+1) = Ax(t)$ converges to $c\mathbf{1}$, for some scalar $c$. Such a matrix
corresponds to a legitimate agreement algorithm, and can be employed in the scheme 
of Section 3.2 to obtain an averaging algorithm, as long as 1 is an eigenvalue of A 
with multiplicity 1, and the corresponding left eigenvector, denoted by 7r, has nonzero 
entries. Because of the above assumed convergence property, all other eigenvalues 
must have magnitude less than 1. Note, however, that we allow A to have some 
negative entries. 

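Suppose that $A$ has the above properties. Let $1 = \lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigen-
values of $A$, sorted in order of decreasing magnitude. We also let $X$ be the set of
vectors of the form $c\mathbf{1}$, i.e., with equal components. Given such a matrix $A$, we define
its convergence rate, $\rho$, by

$$\rho = \sup_{x(0) \notin X} \lim_{t \to \infty} \left( \frac{\|x(t) - x^*\|_2}{\|x(0) - x^*\|_2} \right)^{1/t}, \tag{4.1}$$

where $x^*$ stands for $\lim_{t \to \infty} x(t)$. As is well known, we have $\rho = \max\{|\lambda_2|, |\lambda_n|\}$.
We also define the convergence time $T_n(\epsilon)$ by

$$T_n(\epsilon) = \min\left\{ \tau : \frac{\|x(t) - x^*\|_\infty}{\|x(0) - x^*\|_\infty} \le \epsilon, \ \forall\, t \ge \tau, \ \forall\, x(0) \notin X \right\}.$$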

Although we use the infinity norm to define the convergence time, bounds for other 
norms can be easily obtained from our subsequent results, using the equivalence of 
norms. 

Under the above assumptions, a result from [33] states

$$\rho = \max\{|\lambda_2|, |\lambda_n|\}.$$

To study $\rho$, therefore, we must develop techniques to bound the eigenvalues of the
matrix $A$. To this end, we will be using the following result from [23]. We present
here a slightly more general version, and include a proof, for completeness.

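Theorem 4.1. Consider an $n \times n$ matrix $A$ and let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be its eigenval-
ues, sorted in order of decreasing magnitude. Suppose that the following conditions
hold.

(a) We have $\lambda_1 = 1$ and $A\mathbf{1} = \mathbf{1}$.

(b) There exists a positive vector $\pi$ such that $\pi^T A = \pi^T$.

(c) For every $i$ and $j$, we have $\pi_i a_{ij} = \pi_j a_{ji}$.

Let

$$S = \left\{ x \;\Big|\; \sum_{i=1}^n \pi_i x_i = 0, \ \sum_{i=1}^n \pi_i x_i^2 = 1 \right\}.$$

Then, all eigenvalues of $A$ are real, and

$$\lambda_2 = 1 - \frac{1}{2} \min_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (x_i - x_j)^2. \tag{4.2}$$

In particular, for any vector $y$ that satisfies $\sum_{i=1}^n \pi_i y_i = 0$, we have

$$\lambda_2 \ge 1 - \frac{\sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (y_i - y_j)^2}{2 \sum_{i=1}^n \pi_i y_i^2}. \tag{4.3}$$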

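Proof. Let $D$ be a diagonal matrix whose $i$th diagonal entry is $\pi_i$. Condition (c)
yields $DA = A^T D$. We define the inner product $\langle \cdot, \cdot \rangle_\pi$ by $\langle x, y \rangle_\pi = x^T D y$. We then
have

$$\langle x, Ay \rangle_\pi = x^T D A y = x^T A^T D y = \langle Ax, y \rangle_\pi.$$

Therefore, $A$ is self-adjoint with respect to this inner product, which proves that $A$
has real eigenvalues.

Since the largest eigenvalue is 1, with eigenvector $\mathbf{1}$, we use the variational
characterization of the eigenvalues of a self-adjoint matrix (Chapter 7, Theorem 4.3
of [29]) to obtain

$$\begin{aligned} \lambda_2 &= \max_{x \in S} \langle x, Ax \rangle_\pi \\ &= \max_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} x_i x_j \\ &= \frac{1}{2} \max_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} \left( x_i^2 + x_j^2 - (x_i - x_j)^2 \right). \end{aligned}$$

For $x \in S$, we have

$$\sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (x_i^2 + x_j^2) = 2 \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} x_i^2 = 2 \sum_{i=1}^n \pi_i x_i^2 = 2 \langle x, x \rangle_\pi = 2,$$

which yields

$$\lambda_2 = 1 - \frac{1}{2} \min_{x \in S} \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (x_i - x_j)^2.$$

Finally, Eq. (4.3) follows from (4.2) by considering the vector $x_i = y_i / \sqrt{\sum_{j=1}^n \pi_j y_j^2}$.
□

Note that the bound of Eq. (4.3) does not change if we replace the vector $y$ with
$\alpha y$, for any $\alpha \ne 0$.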
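
As a quick sanity check of Eq. (4.2), the following two-node example can be worked out by hand. The matrix is chosen here only for illustration and is not taken from [23].

```latex
% Two-node chain, chosen only for illustration.
A = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}, \qquad
\pi = (1/2,\ 1/2), \qquad \lambda_1 = 1,\ \lambda_2 = 0.

% Conditions (a)-(c) hold:
%   A\mathbf{1} = \mathbf{1}, \quad \pi^T A = \pi^T, \quad \pi_1 a_{12} = \pi_2 a_{21} = 1/4.
% The set S requires x_1 + x_2 = 0 and x_1^2 + x_2^2 = 2, so S = \{(1,-1),\ (-1,1)\}.

\sum_{i=1}^{2} \sum_{j=1}^{2} \pi_i a_{ij} (x_i - x_j)^2
  = 2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot 2^2 = 2
  \quad \text{for } x \in S,
\qquad \text{so} \qquad
1 - \tfrac{1}{2} \cdot 2 = 0 = \lambda_2 .
```

Only the two off-diagonal terms contribute to the double sum, and the right-hand side of (4.2) reproduces the second eigenvalue exactly.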

5. Convergence time for the algorithm of Section 3.3. For the equal-neigh-
bor, time-invariant, bidirectional model, tight bounds on the convergence rate were
derived in [23].

Theorem 5.1. [23] Consider the equal-neighbor, time-invariant, bidirectional
model, on a connected graph with $n$ nodes. The convergence rate satisfies

$$\rho \le 1 - \gamma_1 n^{-3},$$

where $\gamma_1$ is a constant independent of $n$. Moreover, there exists some $\gamma_2 > 0$ such
that for every positive integer $n$, there exists an $n$-node connected symmetric graph
for which

$$\rho \ge 1 - \gamma_2 n^{-3}.$$

Theorem 5.1 is proved in [23] for the case of symmetric graphs without self-arcs.
It is not hard to check that essentially the same proof goes through when self-arcs
are present, the only difference being in the values of the constants $\gamma_1$ and $\gamma_2$. This
is intuitive because the effect of the self-arcs is essentially a "slowing down" of the
Markov chain by a factor of at most 2, and therefore the convergence rate should stay
the same.

Using some additional results on random walks, Theorem 5.1 leads to a tight
bound (within a logarithmic factor) on the convergence time.

Corollary 5.2. The convergence time for the equal-neighbor, time-invariant,
symmetric model on a connected graph on $n$ nodes satisfies

$$T_n(\epsilon) = O(n^3 \log(n/\epsilon)).$$

Furthermore, for every positive integer $n$, there exists an $n$-node connected graph for
which

$$T_n(\epsilon) = \Omega(n^3 \log(1/\epsilon)).$$



Proof. The matrix $A$ is the transition probability matrix for a random walk on
the given graph, where given the current state $i$, the next state is equally likely to be
any of its neighbors (including $i$ itself). Let $p_{ij}(t)$ be the $(i,j)$th entry of the matrix
$A^t$. It is known (Theorem 5.1 of [21]) that

$$|p_{ij}(t) - \pi_j| \le \sqrt{\frac{d_j}{d_i}}\, \rho^t. \tag{5.1}$$

Since $1 \le d_i$ and $d_j \le n$, we have

$$|p_{ij}(t) - \pi_j| \le \sqrt{n}\, \rho^t,$$

for all $i$, $j$, and $t$. Using the result of Theorem 5.1, we obtain

$$|p_{ij}(t) - \pi_j| \le \sqrt{n}\, (1 - \gamma_1 n^{-3})^t. \tag{5.2}$$

This implies that by taking $t = c n^3 \log(n/\epsilon)$, where $c$ is a sufficiently large absolute
constant, we will have $|p_{ij}(\tau) - \pi_j| \le \epsilon/n$ for all $i$, $j$, and $\tau \ge t$.

Let $A^* = \lim_{t \to \infty} A^t$, and $x^* = \lim_{t \to \infty} A^t x(0)$. Note that
$A^* x(0) = x^* = A^t x^* = A^* x^*$, for all $t$. Then, with $t$ chosen as above,

$$\begin{aligned} \|x(t) - x^*\|_\infty &= \|A^t (x(0) - x^*)\|_\infty \\ &= \|(A^t - A^*)(x(0) - x^*)\|_\infty \\ &\le \|A^t - A^*\|_\infty \cdot \|x(0) - x^*\|_\infty \\ &\le \epsilon \|x(0) - x^*\|_\infty. \end{aligned}$$



2 Throughout, log will stand for the base-2 logarithm. 

3 Theorem 5.1 of [21] is proved for symmetric graphs without self-arcs. However, the proof does
not use the absence of self-arcs, and when they are present the same proof yields the same result.
We refer the reader to the derivation of Eq. (3.1) in [21] for details.



This establishes the upper bound on $T_n(\epsilon)$.

For the lower bound, note that for every $(i,j) \in \mathcal{E}$, we have $\pi_i a_{ij} = (d_i/E)(1/d_i) = 1/E$,
so that condition (c) in Theorem 4.1 is satisfied. It follows that $A$ has real
eigenvalues. Let $x(0)$ be a (real) eigenvector of $A$ corresponding to the eigenvalue $\rho$.
Then, $x(t) = A^t x(0) = \rho^t x(0)$, which converges to zero, i.e., $x^* = 0$. We then have

$$\frac{\|x(t) - x^*\|_\infty}{\|x(0) - x^*\|_\infty} = \rho^t.$$

By the second part of Theorem 5.1, there exists a graph for which $\rho \ge 1 - \gamma_2 n^{-3}$,
leading to the inequality $T_n(\epsilon) \ge c n^3 \log(1/\epsilon)$, for some absolute constant $c$. □

The $\Omega(n^3)$ convergence time of this algorithm is not particularly attractive. In
the next section, we explore possible improvements in the convergence time by using
different choices for the matrix $A$.

6. Convergence time for the scaled agreement algorithm. In this section,
we consider the scaled agreement algorithm introduced in Section 3.2. As in [33], we
assume the presence of a system designer who chooses the matrix $A$ so as to obtain a
favorable convergence rate, subject to the condition $a_{ij} = 0$ whenever $(i,j) \notin \mathcal{E}$. The
latter condition is meant to represent the network topology through which the nodes
are allowed to communicate. Our aim is to characterize the best possible convergence
rate guarantee. We will see that the convergence rate can be brought arbitrarily
close to zero. However, if we impose a certain "numerical stability" requirement, the
convergence time becomes $\Omega(n^2 \log(1/\epsilon))$, for a worst-case choice of the underlying
graph. Furthermore, this worst-case lower bound applies even if we allow for matrices
$A$ in a much larger class than that considered in [33]. Finally, we will show that a
convergence time of $O(n^2 \log(n/\epsilon))$ can be guaranteed in a simple manner, using a
spanning tree.

6.1. Favorable but impractical convergence rates. In this section, we show
that given a connected symmetric directed graph $G = (\mathcal{N}, \mathcal{E})$, there is an elementary
way of choosing a stochastic matrix $A$ for which $\rho$ is arbitrarily close to zero.

We say that a directed graph is a bidirectional spanning tree if (a) it is symmetric,
(b) it contains all self-arcs $(i,i)$, and (c) if we delete the self-arcs, ignore the orientation
of the arcs and remove duplicate arcs, we are left with a spanning tree.

Without loss of generality, we assume that $G$ is a bidirectional spanning tree;
since $G$ is symmetric and connected, this amounts to deleting some of its arcs, or,
equivalently, setting $a_{ij} = 0$ for all deleted arcs $(i,j)$.

Pick an arbitrary node, denoted by $r$, and designate it as the root. Consider an arc
$(i,j)$ and suppose that $j$ lies on the path from $i$ to the root. Let $a_{ij} = 1$ and $a_{ji} = 0$.
Finally, let $a_{rr} = 1$, and $a_{ii} = 0$ for $i \ne r$. This corresponds to a Markov chain
in which the state moves deterministically towards the root. We have $A^t = \mathbf{1} e_r^T$,
for all $t \ge n$, where $e_i$ is the $i$th basis vector. It follows that $\rho = 0$, and $T_n(\epsilon) \le n$.
However, this matrix $A$ is not useful, because the corresponding vector of steady-state
probabilities has mostly zero entries, which prohibits the scaling discussed in Section
3.2. Nevertheless, this is easily remedied by perturbing the matrix $A$, as follows. For
every $(i,j) \in \mathcal{E}$ with $i \ne j$ and $a_{ij} = 0$, let $a_{ij} = \delta$, where $\delta$ is a small positive constant.
For every $i$, there exists a unique $j$ for which $a_{ij} = 1$. For any such pair $(i,j)$, we set
$a_{ij} = 1 - \sum_{k \ne j} a_{ik}$ (which is nonnegative as long as $\delta$ is chosen small enough). We
have thus constructed a new matrix $A_\delta$ which corresponds to a Markov chain whose
transition diagram is a bidirectional spanning tree. Since the convergence rate $\rho$ is an
eigenvalue of the iteration matrix, and eigenvalues are continuous functions of matrix
elements, we see that, for the matrix $A_\delta$, the convergence rate $\rho$ can be made as small
as desired, by choosing $\delta$ sufficiently small. Finally, since $A_\delta$ is a positive matrix, the
corresponding vector of steady-state probabilities is positive.

To summarize, by picking $\delta$ suitably small, we can choose a (stochastic) matrix
$A_\delta$ with arbitrarily favorable convergence rate, and which allows the application of
the scaled agreement algorithm of Section 3.2. It can be shown that the convergence
time is linear in the following sense: For every $\epsilon$, there exists some $\delta$ such that,
for the matrix $A_\delta$, the corresponding convergence time, denoted by $T_n(\epsilon; \delta)$, satisfies
$T_n(\epsilon; \delta) \le n$. Indeed, this is an easy consequence of the facts $\lim_{\delta \to 0} (A_\delta^n - A^n) = 0$
and $T_n(\epsilon'; 0) \le n$ for every $\epsilon' > 0$.

However, note that as $n$ gets larger, $n\pi_i$ may approach 0 at the non-root nodes.
The implementation of the scaling in Eq. (3.2) will involve division by a number which
approaches 0, possibly leading to numerical difficulties. Thus, the resulting averaging
algorithm may be undesirable. Setting averaging aside, the agreement algorithm based
on $A_\delta$, with $\delta$ small, is also undesirable: despite its favorable convergence rate, the
final value on which consensus is reached is approximately equal to the initial value
$x_r(0)$ of the root node. Such a "dictatorial" solution runs contrary to the motivation
behind consensus algorithms.
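
The unperturbed root-directed construction, and the "dictatorial" outcome it produces, can be seen in a short Python sketch. The parent-array encoding of the tree, the function names, and the 4-node path are assumptions made here for illustration.

```python
def mat_vec(A, x):
    """Return A x for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def root_directed_matrix(parent, root):
    """Set a_ij = 1 when j is the next node on the path from i to the root,
    and a_rr = 1 at the root; all other entries are 0."""
    n = len(parent)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        j = root if i == root else parent[i]
        A[i][j] = 1.0
    return A


# Hypothetical path 0 - 1 - 2 - 3 rooted at node 0; parent[i] is the
# neighbor of i that lies toward the root.
parent = [0, 0, 1, 2]
A = root_directed_matrix(parent, root=0)

x = [7.0, 1.0, 2.0, 3.0]
for _ in range(len(parent)):  # n steps always suffice
    x = mat_vec(A, x)
```

After at most $n$ steps every node holds the root's initial value 7, regardless of the other initial values. This is the extreme case of the behavior described above for $A_\delta$ with small $\delta$.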

6.2. A lower bound. In order to avoid the numerical issues raised above, we
will now impose a condition on the dominant (and positive) left eigenvector $\pi$ of the
matrix $A$, and require

$$n \pi_i \ge \frac{1}{C}, \quad \forall\, i, \tag{6.1}$$

where $C$ is a given constant with $C > 1$. This condition ensures that $n\pi_i$ does not
approach 0 as $n$ gets large, so that the initial conditions in the scaled agreement
algorithm of Section 3.2 are well-behaved. Furthermore, in the context of consensus
algorithms, condition (6.1) has an appealing interpretation: it requires that the initial
value $x_i(0)$ of every node $i$ has a nonnegligible impact on the final value $\lim_{t \to \infty} x_k(t)$,
on which consensus is reached.

We will now show that under the additional condition (6.1), there are graphs
for which the convergence time is $\Omega(n^2 \log(1/\epsilon))$. One may wonder whether a better
convergence time is possible by allowing some of the entries of $A$ to be negative. As
the following result shows, negative entries do not help. The graph that we employ is
a line graph, with arc set $\mathcal{E} = \{(i,j) \mid |i - j| \le 1\}$.

Theorem 6.1. Consider an $n \times n$ matrix $A$ such that $a_{ij} = 0$ whenever $|i - j| > 1$,
and such that the graph with edge set $\{(i,j) \in \mathcal{E} \mid a_{ij} \ne 0\}$ is connected. Let
$\lambda_1, \lambda_2, \ldots$ be its eigenvalues in order of decreasing modulus. Suppose that $\lambda_1 = 1$ and
$A\mathbf{1} = \mathbf{1}$. Furthermore, suppose that there exists a vector $\pi$ satisfying Eq. (6.1) such
that $\pi^T A = \pi^T$. Then, there exist absolute constants $c_1$ and $c_2$ such that

$$\rho \ge 1 - c_1 \frac{C}{n^2},$$

and

$$T_n(\epsilon) \ge c_2 \frac{n^2}{C} \log(1/\epsilon).$$

4 Indeed, it is easy to see that by suitably choosing the root, we can make sure that convergence
time is at most $\lceil d(G)/2 \rceil$, where $d(G)$ is the diameter of the graph $G$, defined as the largest distance
between any two vertices.

5 In the case where $A$ is the transition matrix of a reversible Markov chain, there is an additional
interpretation. A reversible Markov chain may be viewed as a random walk on an undirected graph
with edge-weights. Defining the degree of a vertex as the sum of the weights incident upon
it, the condition $n\pi_i \ge 1/C$ is equivalent to requiring that each degree is lower bounded by a constant
times the average degree.

Proof. If the entries of $A$ were all nonnegative, we would be dealing with a birth-
death Markov chain. Such a chain is reversible, i.e., satisfies the detailed balance
equations $\pi_i a_{ij} = \pi_j a_{ji}$ (condition (c) in Theorem 4.1). In fact the derivation of the
detailed balance equations does not make use of nonnegativity; thus, detailed balance
holds in our case as well.

Without loss of generality, we can assume that $\sum_{i=1}^n \pi_i = 1$. For $i = 1, \ldots, n$,
let $y_i = i - \beta$, where $\beta$ is chosen so that $\sum_{i=1}^n \pi_i y_i = 0$. We will make use of the
inequality (4.3). Since $a_{ij} = 0$ whenever $|i - j| > 1$, we have

$$\sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} (y_i - y_j)^2 \le \sum_{i=1}^n \sum_{j=1}^n \pi_i a_{ij} = 1. \tag{6.2}$$

Furthermore,

$$\sum_{i=1}^n \pi_i y_i^2 \ge \frac{1}{nC} \sum_{i=1}^n y_i^2 = \frac{1}{nC} \sum_{i=1}^n (i - \beta)^2 \ge \frac{1}{nC} \sum_{i=1}^n \left( i - \frac{n+1}{2} \right)^2 \ge \frac{n^2 - 1}{12C}. \tag{6.3}$$

The next to last inequality above is an instance of the general inequality $E[(X - \beta)^2] \ge \mathrm{var}(X)$,
applied to a discrete uniform random variable $X$. The last inequality follows
from the well known fact $\mathrm{var}(X) = (n^2 - 1)/12$. Using the inequality (4.3) and Eqs.
(6.2)-(6.3), we obtain the desired bound on $\rho$.

For the bound on $T_n(\epsilon)$, we let $x(0)$ be a (real) eigenvector of $A$, associated with
the eigenvalue $\lambda_2$, and proceed as in the end of the proof of Corollary 5.2. □

Remark: Note that if the matrix A is as in the previous theorem, it is possible for 
the iteration x(t + 1) = Ax(t) not to converge at all. Indeed, nothing in the argument 
precludes the possibility that the smallest eigenvalue is —1, for example. In such a 
case, the lower bounds of the theorem - derived based on bounding the second largest 
eigenvalue - still hold as the convergence rate and time are infinite. 

6.3. Convergence time for spanning trees. We finally show that an $O(n^2)$
convergence time guarantee is easily obtained, by restricting to a spanning tree.

Theorem 6.2. Consider the equal-neighbor, time-invariant, bidirectional model
on a bidirectional spanning tree. We have

$$\rho \le 1 - \frac{1}{3n^2},$$

and

$$T_n(\epsilon) = O(n^2 \log(n/\epsilon)).$$
Proof. In this context, we have $\pi_i = d_i/E$, where $E = \sum_{i=1}^n d_i = 2(n-1) + n < 3n$.
(The factor of 2 is because we have arcs in both directions; the additional term $n$
corresponds to the self-arcs.) As in the proof of Theorem 6.1, the detailed balance
conditions $\pi_i a_{ij} = \pi_j a_{ji}$ hold, and we can apply Theorem 4.1. Note that Eq. (4.2) can
be rewritten in the form

$$\lambda_2 = 1 - \frac{1}{2} \min_{\sum_i d_i x_i = 0,\ \sum_i d_i x_i^2 = 1} \ \sum_{(i,j) \in \mathcal{E}} (x_i - x_j)^2. \tag{6.4}$$

We use the methods of [23] to show that for trees, A2 can be upper bounded by 
1 — l/3n 2 . Indeed, suppose that x satisfies ^TJ™ diXi = and diX 2 = 1, and let 
i ma i be such that |x max | = max,; \xt\. Then, 



    $1 = \sum_{i=1}^n d_i x_i^2 \le 3n\, x_{\max}^2,$

and it follows that $|x_{\max}| \ge 1/\sqrt{3n}$. Without loss of generality, assume that $x_{\max} > 0$
(else, replace each $x_i$ by $-x_i$). Since $\sum_i d_i x_i = 0$, there exists some $i$ for which $x_i < 0$;
let us denote such a negative $x_i$ by $x_{\mathrm{neg}}$. Then,

(6.5)  $\frac{1}{\sqrt{3n}} \le x_{\max} - x_{\mathrm{neg}} = (x_{\max} - x_{k_1}) + (x_{k_1} - x_{k_2}) + \cdots + (x_{k_{r-1}} - x_{\mathrm{neg}}),$

where $k_1, k_2, \ldots, k_{r-1}$ are the nodes on the path from $x_{\max}$ to $x_{\mathrm{neg}}$. By the Cauchy-
Schwarz inequality,



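(6.6)  $\frac{1}{3n} \le \frac{n}{2} \sum_{(i,j) \in \mathcal{E}} (x_i - x_j)^2.$

(The factor of 1/2 in the right-hand side arises because the sum includes both terms
$(x_{k_i} - x_{k_{i+1}})^2$ and $(x_{k_{i+1}} - x_{k_i})^2$.) Thus,

    $\sum_{(i,j) \in \mathcal{E}} (x_i - x_j)^2 \ge \frac{2}{3n^2},$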

which proves the bound for the second largest eigenvalue. 

For the smallest eigenvalue, recall that $a_{ii} \ge 1/n$ for every $i$. It follows that the
matrix $A$ is of the form $I/n + Q$, where $I$ is the identity matrix, and $Q$ is a nonnegative
matrix whose row sums are equal to $1 - 1/n$. Thus, all of the eigenvalues of $Q$ have
magnitude bounded above by $1 - 1/n$, which implies that the smallest eigenvalue of
$Q$ is bounded below by $-1 + 1/n$. We conclude that $\lambda_n$, the smallest eigenvalue of
$I/n + Q$, satisfies

    $\lambda_n \ge -1 + \frac{2}{n} \ge -1 + \frac{2}{n^2}.$

For the bound on the convergence time, we proceed as in the proof of Corollary
5.2. Let $p_{ij}(t)$ be the $(i, j)$th entry of $A^t$. Then,

    $|p_{ij}(t) - \pi_j| \le \sqrt{n} \left(1 - \frac{1}{3} n^{-2}\right)^t.$

For a suitable absolute constant $c$ and for $t \ge c n^2 \log(n/\epsilon)$, we obtain $|p_{ij}(t) - \pi_j| \le \epsilon/n$.
The rest of the proof of Corollary 5.2 goes through unchanged. □
In light of the preceding theorem, we propose the following simple heuristic, with
worst-case convergence time $O(n^2 \log(n/\epsilon))$, as an alternative to a more elaborate
design of the matrix $A$.

Algorithm 6.3. We are given a symmetric graph $G$. We delete enough arcs to
turn $G$ into a bidirectional spanning tree, and then carry out the equal-neighbor, time-
invariant, bidirectional consensus algorithm, with initial value $x_i(0)/(n\pi_i)$ at node $i$.

Let us remark that the $O(n^2 \log(n/\epsilon))$ bound (Theorem 6.2) on the convergence
time of this heuristic is essentially tight (within a factor of $\log n$). Indeed, if the
given graph is a line graph, then with our heuristic we have $n\pi_i = n d_i/E \ge 2/3$, and
Theorem 6.1 provides an $\Omega(n^2 \log(1/\epsilon))$ lower bound.
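
The heuristic can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the graph is a dictionary `adj` mapping nodes `0, ..., n-1` to neighbor sets, the spanning tree is found by breadth-first search (Algorithm 6.3 does not prescribe how arcs are deleted), and the function names are hypothetical.

```python
from collections import deque


def spanning_tree(adj, root=0):
    """Return tree neighbor sets found by breadth-first search from `root`."""
    tree = {i: set() for i in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                tree[u].add(v)
                tree[v].add(u)
                queue.append(v)
    if len(seen) != len(adj):
        raise ValueError("graph is not connected")
    return tree


def tree_average(adj, x0, steps):
    """Equal-neighbor iteration on a spanning tree with scaled initial values."""
    tree = spanning_tree(adj)
    n = len(adj)
    # d_i counts the tree neighbors of i plus the self-arc.
    deg = {i: len(tree[i]) + 1 for i in tree}
    total = sum(deg.values())  # E = 2(n - 1) + n
    # Scaled initial value x_i(0) / (n * pi_i), with pi_i = d_i / E.
    x = {i: x0[i] * total / (n * deg[i]) for i in tree}
    for _ in range(steps):
        # Each node replaces its value by the average over itself and its tree neighbors.
        x = {i: (x[i] + sum(x[j] for j in tree[i])) / deg[i] for i in tree}
    return x
```

The scaling works because the equal-neighbor iteration converges to $\sum_i \pi_i x_i(0)$; starting from $x_i(0)/(n\pi_i)$ therefore yields $(1/n)\sum_i x_i(0)$.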

7. Convergence time when using a doubly stochastic matrix. We provide 
here a brief comparison of our methods with two methods that have been proposed 
in the literature, and which rely on doubly stochastic matrices. Recall that doubly 
stochastic matrices give rise directly to an averaging algorithm, without the need for 
scaling the initial values. 

(a) Reference [33] considers the case where the graph $G$ is given, and studies the
problem of choosing a doubly stochastic matrix $A$ for which the convergence
rate $\rho$ is smallest. In order to obtain a tractable (semidefinite programming)
formulation, this reference imposes the further restriction that $A$ be sym-
metric. For a doubly stochastic matrix, we have $\pi_i = 1/n$, for all $i$, so that
condition (6.1) holds with $C = 1$. According to Theorem 6.1, there exists a
sequence of graphs, for which we have $T_n(\epsilon) = \Omega(n^2 \log(1/\epsilon))$. We conclude
that despite the sophistication of this method, its worst-case guarantee is no
better (ignoring the $\log n$ factor) than the simple heuristic we have proposed
(Algorithm 6.3). On the other hand, for particular graphs, the design method
of [33] may yield better convergence times.

(b) The following method was proposed in [27]. The nodes first agree on some
value $\epsilon \in (0, 1/\max_i d_i)$. (This is easily accomplished in a distributed man-
ner.) Then, the nodes iterate according to the equation

(7.1)  $x_i(t+1) = (1 - \epsilon d_i)\, x_i(t) + \epsilon \sum_{j \in \mathcal{N}(i) \setminus \{i\}} x_j(t).$

Assuming a connected graph, the iteration converges to consensus (this is a
special case of Theorem 2.4). Furthermore, this iteration preserves the sum
$\sum_{i=1}^n x_i(t)$. Equivalently, the corresponding matrix $A$ is doubly stochastic,
as required in order to have an averaging algorithm.

This algorithm has the disadvantage of uniformly small step sizes. If many
of the nodes have degrees of the order of $n$, there is no significant theoretical
difference between this approach and our Algorithm 3.1, as both have effective
step sizes of order of $1/n$. On the other hand, if only a small number of nodes
have large degree, then the algorithm in [27] will force all the nodes to take
small steps. This drawback is avoided by our Algorithms 3.1 (Section 3.3)
and 6.3 (Section 6.3). A comparison of the method of [27] with Algorithm
3.1 is carried out, through simulation experiments, in Section 9.
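
For concreteness, iteration (7.1) can be sketched as follows. This is only an illustration: we assume that `adj[i]` lists the neighbors of $i$ other than $i$ itself and that $d_i$ is the size of that set, so that the weights at each node sum to one; the function name is hypothetical.

```python
def constant_step_average(adj, x0, eps, steps):
    """Iterate x_i(t+1) = (1 - eps*d_i) x_i(t) + eps * (sum of neighbor values)."""
    # Assumption: adj[i] excludes i itself and d_i = len(adj[i]).
    max_deg = max(len(nbrs) for nbrs in adj.values())
    if not 0 < eps < 1 / max_deg:
        raise ValueError("eps must lie in (0, 1/max_i d_i)")
    x = dict(x0)
    for _ in range(steps):
        x = {i: (1 - eps * len(adj[i])) * x[i] + eps * sum(x[j] for j in adj[i])
             for i in adj}
    return x
```

On a star graph, for example, the single high-degree center forces a small `eps` on every leaf, which is the drawback discussed above.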

8. Averaging with dynamic topologies. In this section, we turn our atten- 
tion to the more challenging case where communications are bidirectional but the 
network topology changes dynamically. Averaging algorithms for such a context have 
been considered previously in [53J [5S] . 
As should be clear from the previous sections, consensus and averaging algo-
rithms are intimately linked, with the agreement algorithm often providing a foun-
dation for the development of an averaging algorithm. For this reason, we start by
investigating the worst-case performance of the agreement algorithm in a dynamic
environment. Unfortunately, as shown in Section 8.1, its convergence time is not
polynomially bounded, in general, even though it is an open question whether this is
also the case when we restrict to symmetric graphs. Motivated by this negative result,
we approach the averaging problem differently: we introduce an averaging algorithm
based on "load balancing" ideas (Section 8.2), and prove a polynomial bound on its
convergence time (Section 8.3).

8.1. Non-polynomial convergence time for the agreement algorithm. 

We begin by formally defining the notion of "convergence time" for the agreement
algorithm on dynamic graph sequences. Given a sequence of graphs $G(t)$ on $n$ nodes
such that Assumption 2.3 of Section 2 is satisfied for some $B > 0$, and an initial con-
dition $x(0)$, we define the convergence time $T_{G(\cdot)}(x(0), \epsilon)$ (for this particular graph
sequence and initial condition) as the first time $t$ when each node is within an $\epsilon$-
neighborhood of the final consensus, i.e., $\|x(t) - \lim_{t \to \infty} x(t)\|_\infty \le \epsilon$. We then define
the (worst-case) convergence time, $T_n(B, \epsilon)$, as the maximum value of $T_{G(\cdot)}(x(0), \epsilon)$,
over all graph sequences $G(\cdot)$ on $n$ nodes that satisfy Assumption 2.3 for that partic-
ular $B$, and all initial conditions that satisfy $\|x(0)\|_\infty \le 1$.

We focus our attention on the equal-neighbor version of the agreement algorithm. 
The result that follows shows that its convergence time is not bounded by a polynomial 
in n and B. In particular, if B is proportional to n, the convergence time increases 
faster than an exponential in n. We note that the upper bound in Theorem 8.1 is not
a new result, but we include it for completeness, and for comparison with the lower 
bound, together with a proof sketch. Similar upper bounds have also been provided 
recently in [7], under slightly different assumptions on the graph sequence G(-). 

Theorem 8.1. For the equal-neighbor agreement algorithm, there exist positive
constants $c_1$ and $c_2$ such that for every $n$, $B$, and $1 > \epsilon > 0$,

(8.1)  $c_1 n B \left(\frac{n}{2}\right)^{B-1} \log\frac{1}{\epsilon} \le T_n(B, \epsilon) \le c_2 B n^{nB} \log\frac{1}{\epsilon}.$

Proof. The upper bound follows by inspecting the proof of convergence of the
agreement algorithm with the constant $\alpha$ of Assumption 2.1 set to $1/n$ (cf. [30], [4]).

We now prove the lower bound by exhibiting a sequence of graphs $G(t)$ and an ini-
tial vector $x(0)$, with $\|x(0)\|_\infty \le 1$, for which $T_{G(\cdot)}(x(0), \epsilon) \ge c_1 n B (n/2)^{B-1} \log(1/\epsilon)$.
We assume that $n$ is even and $n \ge 4$. The initial condition $x(0)$ is defined as $x_i(0) = 1$
for $i = 1, \ldots, n/2$, and $x_i(0) = -1$ for $i = n/2 + 1, \ldots, n$.

(i) The graph $G(0)$, used for the first iteration, is shown in the left-hand side of
Figure 8.1.

(ii) For $t = 1, \ldots, B - 2$, we perform an equal-neighbor iteration, each time using
the graph $G(t)$ shown in the right-hand side of Figure 8.1.

(iii) Finally, at time $B - 1$, the graph $G(B - 1)$ consists of the complete graph
over the nodes $\{1, \ldots, n/2\}$ and the complete graph over the nodes
$\{n/2 + 1, \ldots, n\}$.

(iv) This sequence of $B$ graphs is then repeated: $G(t + kB) = G(t)$ for every
positive integer $k$.

It is easily seen that this sequence of graphs satisfies Assumption 2.3 and that con-
vergence to consensus is guaranteed.
Fig. 8.1. The diagram on the left is the graph $G(0)$. The diagram on the right is the graph
$G(t)$ at times $t = 1, \ldots, B - 2$. Self-arcs are not drawn but should be assumed present at every node.



At the end of the first iteration, we have $x_i(1) = x_i(0)$, for $i \ne 1, n$, and

(8.2)  $x_1(1) = \frac{(n/2) - 1}{(n/2) + 1} = 1 - \frac{4}{n + 2}.$

Consider now the evolution of $x_1(t)$, for $t = 1, \ldots, B - 2$, and let $\alpha(t) = 1 - x_1(t)$.
We have

    $x_1(t + 1) = \frac{1 \cdot (1 - \alpha(t)) + (n/2 - 1) \cdot 1}{n/2} = 1 - (2/n)\,\alpha(t),$

so that $\alpha(t + 1) = 2\alpha(t)/n$. From Eq. (8.2), $\alpha(1) = 4/(n + 2)$, which implies that
$\alpha(B - 1) = (2/n)^{B-2}\,\alpha(1)$, or

    $x_1(B - 1) = 1 - \frac{4}{n + 2} \left(\frac{2}{n}\right)^{B-2}.$

By symmetry,

    $x_n(B - 1) = -1 + \frac{4}{n + 2} \left(\frac{2}{n}\right)^{B-2}.$

Finally, at time $B - 1$, we iterate on the complete graph over nodes $\{1, \ldots, n/2\}$
and the complete graph over nodes $\{n/2 + 1, \ldots, n\}$. For $i = 2, \ldots, n/2$, we have
$x_i(B - 1) = 1$, and we obtain

    $x_i(B) = \frac{(n/2 - 1) \cdot 1 + x_1(B - 1)}{n/2} = 1 - \frac{4}{n + 2} \left(\frac{2}{n}\right)^{B-1}.$

Similarly, for $i = (n/2) + 1, \ldots, n$, we obtain

    $x_i(B) = -1 + \frac{4}{n + 2} \left(\frac{2}{n}\right)^{B-1}.$
Thus,

    $\frac{|\max_i x_i(B) - \min_i x_i(B)|}{|\max_i x_i(0) - \min_i x_i(0)|} = 1 - \frac{4}{n + 2} \left(\frac{2}{n}\right)^{B-1}.$

Moreover, because $x(B)$ is simply a scaled version of $x(0)$, it is clear that by repeating
this sequence of graphs, we will have

    $\frac{|\max_i x_i(kB) - \min_i x_i(kB)|}{|\max_i x_i(0) - \min_i x_i(0)|} = \left(1 - \frac{4}{n + 2} \left(\frac{2}{n}\right)^{B-1}\right)^k.$

This readily implies that

    $T_{G(\cdot)}(x(0), \epsilon) = \Omega\left(n B \left(\frac{n}{2}\right)^{B-1} \log\frac{1}{\epsilon}\right).$

If $n$ is odd, then $n' = n - 1$ is even. We apply the same initial condition and
graph sequence as above to nodes $\{1, \ldots, n'\}$. As for the additional node $x_n$, we let
$x_n(0) = 0$ and make extra connections by connecting node $n$ to nodes $1$ and $n'$ at
time $0$ with a bidirectional link. By repeating the analysis above, it can be verified
that

    $T_{G(\cdot)}(x(0), \epsilon) = \Omega\left(n B \left(\frac{n - 1}{2}\right)^{B-1} \log\frac{1}{\epsilon}\right).$

This concludes the proof. □
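
To make the last step of the even-$n$ argument concrete, the following sketch counts how many repetitions of the $B$-graph cycle are needed before the spread $\max_i x_i - \min_i x_i$ shrinks by a factor $\epsilon$, using the per-cycle contraction factor $1 - \frac{4}{n+2}(2/n)^{B-1}$ derived above. The function names are ours, and the code evaluates the formula rather than simulating the graphs of Figure 8.1.

```python
import math


def cycles_to_shrink(n, B, eps):
    """Number of B-step cycles until the spread max - min shrinks by a factor eps."""
    # Per-cycle contraction of the spread in the even-n construction.
    factor = 1 - (4 / (n + 2)) * (2 / n) ** (B - 1)
    return math.ceil(math.log(eps) / math.log(factor))


def lower_bound_time(n, B, eps):
    """Total number of iterations: B times the number of cycles."""
    return B * cycles_to_shrink(n, B, eps)
```

Since $\log(1/(1 - y)) \approx y$ for small $y$, the number of cycles is roughly $\frac{n+2}{4}(n/2)^{B-1}\log(1/\epsilon)$, which gives the $\Omega(nB(n/2)^{B-1}\log(1/\epsilon))$ growth once multiplied by $B$.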

Both the upper and lower bounds in Theorem 8.1 display an exponential growth
of the convergence time, as a function of $B$. It is unclear, however, which of the two
terms, $n^B$ or $n^{nB}$, better captures the behavior of $T_n(B, \epsilon)$.

8.2. Polynomial-time averaging in dynamic topologies. The algorithm we 
present here is a variation of an old load balancing algorithm (see [5] and Chapter 7.3 
of [3]). Intuitively, a collection of processors with different initial loads try to equalize 
their respective loads. As some of the highly loaded processors send some of their 
load to their less loaded neighbors, the loads at different nodes tend to become equal. 
Similarly, at each step of our algorithm, each node offers some of its value to its 
neighbors, and accepts or rejects such offers from its neighbors. Once an offer from
$i$ to $j$ to send $\delta$ has been accepted, the updates $x_i := x_i - \delta$ and $x_j := x_j + \delta$ are
executed.

We assume a time-varying sequence of graphs $G(t)$. We only make two assump-
tions on $G(\cdot)$: symmetry and bounded interconnectivity times (see Section 2 for def-
initions). The symmetry assumption is natural if we consider, for example, commu-
nication between two nodes to be feasible whenever the nodes are within a certain
distance of each other. The assumption of bounded interconnectivity times is neces- 
sary for an upper bound on the convergence time to exist (otherwise, we could insert 
infinitely many empty graphs G(t), in which case convergence is arbitrarily slow for 
any algorithm). 

We next describe formally the steps that each node carries out at each time t. 
For definiteness, we refer to the node executing the steps below as node A. Moreover, 
the instructions below sometimes refer to the "neighbors" of node A; this always 
means nodes other than A that are neighbors at time t, when the step is being 
executed (since G(t) can change with t, the set of neighbors of A can also change). 
Let $\mathcal{N}_i(t) = \{j \ne i : (i, j) \in \mathcal{E}(t)\}$. Note that this is a little different from the definition
of $\mathcal{N}_i(t)$ in earlier sections, in that $i$ is no longer considered a neighbor of itself.
Algorithm 8.2. If $\mathcal{N}_A(t)$ is empty, node $A$ does nothing at time $t$. Else, node
$A$ carries out the following steps.

1. Node $A$ broadcasts its current value $x_A$ to all of its neighbors (every $j$ with
$j \in \mathcal{N}_A(t)$).

2. Node $A$ finds a neighboring node $B$ with the smallest value: $x_B = \min\{x_j :
j \in \mathcal{N}_A(t)\}$. If $x_A \le x_B$, then node $A$ does nothing further at this step. If
$x_B < x_A$, then node $A$ makes an offer of $(x_A - x_B)/2$ to node $B$.

3. If node $A$ does not receive any offers, it does nothing further at this step.
Otherwise, it sends an acceptance to the sender of the largest offer and a
rejection to all the other senders. It updates the value of $x_A$ by adding the
value of the accepted offer.

4. If an acceptance arrives for the offer made by node $A$, node $A$ updates $x_A$ by
subtracting the value of the offer.

For concreteness, we use $x_i(t)$ to denote the value possessed by node $i$ at the
beginning of the above described steps. Accordingly, the value possessed by node $i$ at
the end of the above steps will be $x_i(t + 1)$. The algorithm we have specified clearly
keeps the value of $\sum_{i=1}^n x_i(t)$ constant. Furthermore, it is a valid averaging algorithm,
as stated in Theorem 8.3 below. We do not provide a separate proof, because this
result follows from the convergence time bounds in the next subsection.
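
One round of Algorithm 8.2 can be sketched centrally as follows. This is a simulation aid under our own assumptions, not a distributed implementation: all nodes act synchronously, node labels are comparable (e.g., integers), ties are broken by node label (the algorithm leaves tie-breaking unspecified), and the function name is hypothetical.

```python
def load_balance_step(x, nbrs):
    """One synchronous round of offers and acceptances.

    x    : dict node -> current value x_i(t)
    nbrs : dict node -> set of neighbors at time t (excluding the node itself)
    """
    # Step 2: each node offers half the gap to its smallest-valued neighbor.
    offers = {}  # receiver -> list of (amount, sender)
    for a in x:
        if not nbrs[a]:
            continue
        # Ties between equal-valued neighbors are broken by label (assumption).
        b = min(nbrs[a], key=lambda j: (x[j], j))
        if x[b] < x[a]:
            offers.setdefault(b, []).append(((x[a] - x[b]) / 2, a))
    # Steps 3 and 4: each receiver accepts only the largest offer it got;
    # the accepted amount moves from the sender to the receiver.
    new_x = dict(x)
    for b, received in offers.items():
        amount, a = max(received)
        new_x[b] += amount
        new_x[a] -= amount
    return new_x
```

Every accepted offer adds an amount at one node and subtracts the same amount at another, which is why the sum of the values is unchanged from round to round.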

Theorem 8.3. Suppose that each $G(t)$ is symmetric and that Assumption 2.3
(bounded interconnectivity times) holds. Then, $\lim_{t \to \infty} x_i(t) = \frac{1}{n} \sum_{k=1}^n x_k(0)$, for
all $i$.

8.3. Convergence time. We introduce the following "Lyapunov" function that
quantifies the distance of the state $x(t)$ of the agents from the desired limit:

    $V(t) = \left\| x(t) - \frac{1}{n} \sum_{i=1}^n x_i(0)\, \mathbf{1} \right\|_2^2.$

Intuitively, $V(t)$ measures the variance of the values at the different nodes. Given
a sequence of graphs $G(t)$ on $n$ nodes, and an initial vector $x(0)$, we define the
convergence time $T_{G(\cdot)}(x(0), \epsilon)$ as the first time $t$ after which $V(\cdot)$ remains smaller
than $\epsilon V(0)$:

    $T_{G(\cdot)}(x(0), \epsilon) = \min\{ t \mid V(t') \le \epsilon V(0),\ \forall\, t' \ge t \}.$

We then define the (worst-case) convergence time, $T_n(B, \epsilon)$, as the maximum value of
$T_{G(\cdot)}(x(0), \epsilon)$, over all graph sequences $G(\cdot)$ on $n$ nodes that satisfy Assumption 2.3
for that particular $B$, and all initial conditions $x(0)$.
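
As a rough computational reading of these definitions, the sketch below evaluates $V$ and the convergence time on a recorded, finite trajectory. The function names are ours, and on a finite trajectory the result is only an estimate: the definition quantifies over all later times, which a finite record cannot verify.

```python
def lyapunov(x, x0):
    """V = sum_i (x_i - mean of x(0))^2, the squared distance to the target."""
    mean0 = sum(x0) / len(x0)
    return sum((xi - mean0) ** 2 for xi in x)


def convergence_time(trajectory, eps):
    """First index t such that V stays at or below eps * V(0) for the rest
    of the recorded trajectory; None if the final state is above the threshold."""
    x0 = trajectory[0]
    v = [lyapunov(x, x0) for x in trajectory]
    t = len(v)
    # Walk backwards while the tail stays under the threshold.
    while t > 0 and v[t - 1] <= eps * v[0]:
        t -= 1
    return t if t < len(v) else None
```

For the load-balancing algorithm, $V(t)$ is shown below to be nonincreasing, so the backward scan and a simple forward scan for the first threshold crossing agree.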

Theorem 8.4. There exists a constant $c > 0$ such that for every $n$ and $1 > \epsilon > 0$,
we have

(8.3)  $T_n(B, \epsilon) \le c B n^3 \log\frac{1}{\epsilon}.$

Proof. The proof is structured as follows. Without loss of generality, we assume
that $\sum_{i=1}^n x_i(0) = 0$; this is possible because adding a constant to each $x_i$ does not
change the sizes of the offers or the acceptance decisions. We will show that $V(t)$ is
nonincreasing in $t$, and that

(8.4)  $V((k + 1)B) \le \left(1 - \frac{1}{2n^3}\right) V(kB)$

for every nonnegative integer $k$. These two claims readily imply the desired result. To
see this, note that if $V(t)$ decreases by a factor of $1 - 1/(2n^3)$ every $B$ steps, then it
decreases by a $\Theta(1)$ factor in $Bn^3$ steps. It follows that the time until $V(t)$ becomes
less than $\epsilon V(0)$ is $O(Bn^3 \log(1/\epsilon))$. Finally, since $V(t)$ is nonincreasing, $V(t)$ stays
below $\epsilon V(0)$ thereafter.

We first show that V(t) is nonincreasing. We argue that while rejected offers 
clearly do not change V(t), each accepted offer at time t results in a decrease of 
V(t + 1). While this would be straightforward to establish if there were a single accepted
offer, a more complicated argument is needed to account for the possibility of multiple 
offers being simultaneously accepted. We will show that we can view the changes at 
time t as a result of a series of sequentially accepted offers, each of which results in a 
smaller value of V . 

Let us focus on a particular time $t$. We order the nodes from smallest to largest,
so that $x_1(t) \le x_2(t) \le \cdots \le x_n(t)$, breaking ties arbitrarily. Let $D_i(t)$ be the size of
the offer accepted by node $i$ at time $t$ (if any). If the node accepted no offers at time
$t$, set $D_i(t) = 0$. Furthermore, if $D_i(t) > 0$, let $A_i(t)$ be the index of the node
whose offer node $i$ accepted.

Let us now break time $t$ into $n$ periods. The $i$th period involves the updates caused
by node $i$ accepting an offer from node $A_i(t)$. In particular, node $i$ performs the update
$x_i(t) := x_i(t) + D_i(t)$ and node $A_i(t)$ performs the update $x_{A_i(t)}(t) := x_{A_i(t)}(t) - D_i(t)$.

We note that every offer accepted at time t appears in some period in the above 
sequence. We next argue that each offer decreases V. This will complete the proof 
that V(t) is nonincreasing in t. 

Let us suppose that in the $i$th period, node $i$ accepts an offer from node $A_i(t)$,
which we will for simplicity denote by $j$. Because nodes only send offers to lower
valued nodes, the inequality $x_j > x_i$ must hold at the beginning of time $t$, before
time period 1. We claim that this inequality continues to hold when the $i$th time
period is reached. Indeed, $x_j$ is unchanged during periods $1, \ldots, i - 1$ (it can only
send one offer, which was to $x_i$, and if it receives any offers, their effects will occur in
period $j$, which is after period $i$). Moreover, while the value of $x_i$ may have changed
in periods $1, \ldots, i - 1$, it cannot have increased (since $i$ is not allowed to accept more
than one offer at any given time $t$). Therefore, the inequality $x_j > x_i$ still holds at
the beginning of the $i$th period.

During the ith period, a certain positive amount is transferred from node j to 
node $i$. Since the transfer takes place from a higher-valued node to a lower-valued
one, it is easily checked that the value of $x_i^2 + x_j^2$ (which is the contribution of these
two nodes to $V$) is reduced. To summarize, we have shown that we can serialize the
offers accepted at time t, in such a way that each accepted offer causes a reduction in 
V. It follows that V(t) is nonincreasing. 

We will now argue that at some time $t$ in the interval $0, 1, \ldots, B - 1$, there will
be some update (acceptance of an offer) that reduces $V(t)$ by at least $(1/(2n^3))V(0)$.
Without loss of generality, we assume $\max_i |x_i(0)| = 1$, so that all the values lie in
the interval $[-1, 1]$. It follows that $V(0) \le n$.

Since $\sum_{i=1}^n x_i(0) = 0$, it follows that $\min_i x_i(0) < 0$. Hence, the largest gap
between any two consecutive $x_i(0)$ must be at least $1/n$. Thus, there exist some
numbers $a$ and $b$, with $b - a \ge 1/n$, and the set of nodes can be partitioned into two
disjoint subsets $S_-$ and $S_+$ such that $x_i(0) \le a$ for all $i \in S_-$, and $x_i(0) \ge b$ for
all $i \in S_+$. By Assumption 2.3, the graph with arcs $\bigcup_{s=0,\ldots,B-1} \mathcal{E}(s)$ is connected.
Thus, there exists a first time $\tau \in \{0, 1, \ldots, B - 1\}$ at which there is a communication
between some node $i \in S_-$ and some node $j \in S_+$, resulting in an offer from $j$ to $i$.
Up until that time, nodes in $S_-$ have not interacted with nodes in $S_+$. It follows that
$x_k(\tau) \le a$ for all $k \in S_-$, and $x_k(\tau) \ge b$ for all $k \in S_+$. In particular, $x_i(\tau) \le a$ and
$x_j(\tau) \ge b$. There are two possibilities: either $i$ accepts the offer from $j$, or $i$ accepts
some higher offer, from some other node in $S_+$. In either case, we conclude that there
is a first time $\tau \le B - 1$, at which a node in $S_-$ accepts an offer from a node in $S_+$.

Let us use plain $x_i$ and $x_j$ for the values at nodes $i$ and $j$, respectively, at the
beginning of period $i$ of time $\tau$. At the end of that period, the value at both nodes is
equal to $(x_i + x_j)/2$. Thus, the Lyapunov function $V$ decreases by

    $x_i^2 + x_j^2 - 2\left(\frac{x_i + x_j}{2}\right)^2 = \frac{1}{2}(x_i - x_j)^2 \ge \frac{1}{2}(b - a)^2 \ge \frac{1}{2n^2}.$

At every other time and period, $V$ is nonincreasing, as shown earlier. Thus, using the
inequality $V(0) \le n$,

    $V(B) \le V(0) - \frac{1}{2n^2} \le \left(1 - \frac{1}{2n^3}\right) V(0).$

By repeating this argument over the interval $kB, \ldots, (k + 1)B$, instead of the interval
$0, \ldots, B$, we establish Eq. (8.4), which concludes the proof. □

9. Simulations. We have proposed several new algorithms for the distributed
consensus and averaging problems. For one of them, namely the spanning tree heuris-
tic of Section 6.3 (Algorithm 6.3), the theoretical performance has been characterized
completely; see Theorem 6.2 and the discussion at the end of Section 6.3. In this
section, we provide simulation results for the remaining two algorithms.

9.1. Averaging in fixed networks with two passes of the agreement
algorithm. In Section 3.3, we proposed a method for averaging in fixed graphs,
based on two parallel executions of the agreement algorithm (Algorithm 3.1). We
speculated in Section 7 that the presence of a small number of high degree nodes
would make the performance of our algorithm attractive relative to the algorithm of
[27], which uses a step size proportional to the inverse of the largest degree. (Our
implementation used a step size of $\epsilon = 1/(2 d_{\max})$.) Figure 9.1 presents simulation
results for the two algorithms.

In each simulation, we first generate a geometric random graph $G(n, r)$ by placing
nodes randomly in $[0, 1]^2$ and connecting two nodes if they are at most $r$ apart. We
pick $r = \Theta(\sqrt{\log n / n})$, which is a standard choice for modeling wireless networks.

We then change the random graph $G(n, r)$ by picking $n_d$ nodes at random ($n_d =
10$ in both figures) and adding edges randomly to make the degree of these nodes linear
in $n$; this is done by making each edge incident to these nodes present with probability
1/3. We run the algorithm, with random starting values, uniformly distributed in
$[0, 1]$, until the largest deviation from the mean is at most $\epsilon = 10^{-3}$.

Each outcome recorded in Figure 9.1 (for different values of $n$) is the average of
three runs. We conclude that for such graphs, the convergence time of the algorithm
in [27] grows considerably faster than the one proposed in this paper.

9.2. Averaging in time-varying Erdos-Renyi random graphs. We report
here on simulations involving the load-balancing algorithm (Algorithm 8.2) on time-
varying random graphs. In contrast to our previous simulations on a static geometric
graph, we test two time-varying models which simulate movement.
Fig. 9.1. On the left: comparing averaging algorithms on a geometric random graph. The
top line corresponds to the algorithm of [27], and the bottom line (close to the horizontal axis)
corresponds to using two parallel passes of the agreement algorithm (Algorithm 3.1). On the right:
a blowup of the performance of the agreement algorithm.



In both models, we select our initial vector $x(0)$ by choosing each component
independently as a uniform random variable over $[0, 1]$. In our first model, at each
time $t$, we independently generate an Erdos-Renyi random graph $G(t) = G(c, n)$
with $c = 3/4$. In the second model, at each time step we independently generate
a geometric random graph $G(n, r)$ with $r = \sqrt{\log n / n}$. In both models, if the
largest deviation from the mean is at most $\epsilon = 10^{-3}$, we stop; else, we perform another
iteration of the load-balancing algorithm.
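
The two random graph models can be generated along the following lines. This is a sketch under stated assumptions rather than the code used for the figures: we read $c$ in $G(c, n)$ as the probability that each edge is present, place the geometric graph's points uniformly in the unit square, and pass in a seeded `random.Random` instance for reproducibility; the function names are hypothetical.

```python
import math
import random


def erdos_renyi(n, c, rng):
    """Each unordered pair of nodes is joined independently with probability c."""
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < c:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


def geometric(n, rng):
    """Uniform random points in the unit square, joined when at most r apart."""
    r = math.sqrt(math.log(n) / n)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= r:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


rng = random.Random(0)          # fixed seed, chosen arbitrarily
g_er = erdos_renyi(50, 0.75, rng)
g_geo = geometric(50, rng)
```

Drawing a fresh graph from either generator at every time step gives a time-varying sequence $G(t)$ of symmetric graphs of the kind described above.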

The results are summarized in Figure 9.2, where again each point represents the
average of three runs. We conclude that in these random models, only a sublinear 
number of iterations appears to be needed. 






Fig. 9.2. On the left: averaging in time-varying Erdos-Renyi random graphs with the load
balancing algorithm. Here $c = 3/4$ at each time $t$. On the right: averaging in time-varying
geometric random graphs with the load balancing algorithm. Here $r = \sqrt{\log n / n}$.
10. Concluding remarks. In this paper we have considered a variety of con-
sensus and averaging algorithms, and studied their convergence rates. While our
discussion was focused on averaging algorithms, several of our results pertain to the 
closely related consensus problem. 

For the case of a fixed topology, we showed that averaging algorithms are easy 
to construct by using two parallel executions of the agreement algorithm for the 
consensus problem. We also saw that a reasonable performance guarantee can be 
obtained by using the equal-neighbor agreement algorithm on a spanning tree, as 
opposed to a more sophisticated design. 

For the case of a fixed topology, the choice of different algorithms is not a purely 
mathematical issue; one must also take into account the extent to which one is able to 
design the algorithm offline and provide suitable instructions to each node. After all, if 
the nodes are able to set up a spanning tree, there are simple distributed algorithms, 
involving two sweeps along the tree, in opposite directions, with which the sum of 
their initial values can be computed and disseminated [3], thus eliminating the need 
for an iterative algorithm. On the other hand, in less structured environments, with 
the possibility of occasional changes in the system topology, iterative algorithms can 
be more resilient. For example, the equal-neighbor agreement algorithm adjusts itself 
naturally when the topology changes. 

In the face of a changing topology (possibly at each time step), the agreement
algorithm continues to work properly, under minimal assumptions (Theorem 2.4). On
the other hand, its worst-case convergence time may suffer severely (cf. Section 8.1).
Furthermore, it is not apparent how to modify the agreement algorithm and obtain
an averaging algorithm without sacrificing linearity and/or allowing some additional
memory at the nodes. In Section 8, we introduced an averaging algorithm, which is
nonlinear but leads to a rather favorable (and in particular, polynomial) convergence
time bound. In view of the favorable performance observed in our simulation results,
it would also be interesting to characterize the average performance of this algorithm,
under a probabilistic mechanism for generating the graphs $G(t)$, similar to the one in
our simulations.

Note that Algorithm 8.2 requires the topology to remain fixed during the
exchange of offers and acceptances/rejections that
happens at each step. On the other hand, without such an assumption, or without
introducing a much larger memory at each node (which would allow for flooding of
individual values), an averaging algorithm may well turn out to be impossible.